Spatial Interpolation 101: Interpolation in Three Dimensions with Python and the IDW Algorithm. Mercury Concentrations in the Mediterranean Sea.
Move from 2D interpolation to 3D interpolation with the Inverse Distance Weighting algorithm.
<< Previous part: Spatial Interpolation 101: Interpolation of Mercury Concentrations in the Mediterranean Sea with IDW and Python
If this is your first article in the series, I recommend reading the previous lesson linked above.
The transition from 2D to 3D is not as hard as it seems. We can easily adapt the algorithm written for 2-dimensional IDW to perform the same operation in n-dimensional space. In this article we are going to adapt our 2D algorithm to the new case of volumetric interpolation, and we will also look at different styles of visualization of 3-dimensional data. These are mostly research problems, but we will tackle engineering issues too: we will write our first unit test to debug the newly created algorithm. We continue our analysis of mercury contamination in the Mediterranean Sea and we'll build a volumetric profile of methylmercury (MeHg) concentrations.
Data Sources
- The Mercury Concentrations Dataset in this article comes from: Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; Živković, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. PANGAEA, https://doi.org/10.1594/PANGAEA.899723
- I found this dataset and the article through the GEOSS search engine (Geoportal): https://www.geoportal.org
- You may check the lesson's code, notebooks, data and shapefiles here: Github repository
Environment Preparation
You should create a working environment with four packages:
- numpy for scientific computation,
- pandas for tabular data processing and analysis,
- geopandas for spatial data processing and analysis,
- matplotlib for data visualization.
Install them from the conda-forge channel with Python version 3.6 or higher. Then download the data from the publication (csv file) at https://doi.org/10.1594/PANGAEA.899723 along with the Mediterranean coastline shapefile from my Github (input/italy_coastline.shp).
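Assuming a conda-based setup, a single command along these lines should be enough (the environment name and the exact Python version are arbitrary choices):

conda create -n spatial-idw -c conda-forge python=3.8 numpy pandas geopandas matplotlib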
Algorithm Development
! Code listings in this part come from the idw-3d-function.ipynb notebook.
Let's imagine that you are a Data Scientist and your boss has delegated to you the implementation of an algorithm which is not available in the existing Open Source packages. How should you start the implementation? We can begin with something we shouldn't do: we should never implement new statistical algorithms with real-world data! Real observations are messy, and they rarely follow an easy-to-understand pattern (or distribution, if you are more into statistics). Work related to data preparation and cleaning may be tedious, and it steals precious time which could be devoted to the core algorithm. The right path is to create artificial data with well-known properties. It is easier to debug and to cover our code with tests based on a fixed dataset which is always the same, or has the same statistical properties every time we use it. Artificial data may be created on the fly with numpy, the same package which is used for the math operations. Moreover, we are going to visualize our input data with the matplotlib package.
Our work has seven steps:
- Generate a set of the random coordinates in a 3-dimensional space.
- Plot the artificial points.
- Write a function which calculates Euclidean distance between the created points and the sampled location with the unknown value.
- Test the distance calculation function.
- Rewrite the IDW function from the previous tutorial.
- Test the IDW function with the prepared data.
- Test the IDW with a generated point cloud, then interpolate unknown values and plot everything to check the result.
Step 1 – Step 2: Generate data for the algorithm development. Plot Points in the 3-Dimensional Space
How do we generate the artificial data? You may do it manually, but that could be a very time-consuming task. A good and simple idea is to use a random number generator. With it we are able to create multiple 3D sets of points with random values to test and build our algorithm. Creation of a random set of coordinates in n-dimensional space is very easy... if we utilize numpy and its random.randint() function. The arguments of this function are the lowest possible value, the upper bound (exclusive), and the size of the output array. The function returns an array of randomly chosen integers. Listing 1 shows the random.randint() function in action. We generate an array of 100 random coordinates in the form [x, y, z]. Then we show the created points with the scatter3D() function implemented in the matplotlib package.
Listing 1: Algorithm development: random points generation and visualization
# import packages
import numpy as np
import matplotlib.pyplot as plt

# Generate 100 random arrays of 3 integer values between 0 and 1000
random_points = np.random.randint(low=0, high=1000, size=(100, 3))

# Show generated points in 3D scatterplot
figure = plt.figure(figsize=(8, 8), facecolor='w')
ax = figure.add_subplot(111, projection='3d')
ax.scatter3D(random_points[:, 0], random_points[:, 1], random_points[:, 2])
plt.show()
Figure 1 represents the output of the numpy.random.randint() function. Your realization is probably different because we didn't set the random seed, which allows reproducibility of the research. However, if you would like to obtain the same points in every run of the program, set the random seed to a constant value. The method for setting the random seed in a Python program is numpy.random.seed(NUMBER).
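A minimal sketch of a reproducible setup (the seed value 101 is an arbitrary choice):

import numpy as np

# Fix the random seed so that every run generates the same set of points
np.random.seed(101)

random_points = np.random.randint(low=0, high=1000, size=(100, 3))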
Step 3 – Step 4: Write the function which calculates Euclidean distance between the selected set of known points and the location with an unknown value. Test this function
We could calculate the distance between any set of points in any dimension with nested for loops, but why should we bother with that when we can use the powerful numpy package and make the calculations faster? The idea is to use array indexing. Each column in the array represents a different dimension: we can quickly subtract the point's coordinate from each element in the column, square the differences, then sum them and calculate the square root of those sums. Figure 2 shows this operation graphically and Listing 2 is an implementation of this process.
Listing 2: Euclidean distance calculation in n-dimensional space with numpy
def calculate_distances(coordinates, unknown_point):
    # Get coordinates dimension:
    # Number of columns in numpy array / Number of dimensions
    coordinates_dim = coordinates.shape[1]

    distances_between_dims = []
    for i in range(coordinates_dim):
        distance = (coordinates[:, i] - unknown_point[i])**2
        distances_between_dims.append(distance)

    dists_array = np.sqrt(np.sum(distances_between_dims, axis=0))
    return dists_array
The variable coordinates_dim controls the range of the for loop; it is the number of dimensions of our dataset. Within the loop we calculate the squared difference between a specific dimension of all known points and the given unknown point, and we obtain a list of per-dimension squared differences. Outside the loop those values are summed across dimensions (axis=0) and the square root of each sum is the distance to a specific point.
It is a good idea to write a unit test to check if our function works as expected. A unit test requires hard-coded values or special files to monitor how our script behaves. Unit tests are very useful when you develop a new package and changes in one part of it may affect functions in another; to easily find what went wrong, we run the unit tests and check which one has failed. Listing 3 is a simple test of the output of our function. We have a hard-coded 3×3 identity matrix, a 3-element array of 2's (the unknown point) and the expected output, which is a 3-element array of 3's. We expect that the distance from each row of test_arr to test_unknown_point is always equal to 3.
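For example, taking the first known point [1, 0, 0] and the unknown point [2, 2, 2], the manual check is:

$$d = \sqrt{(1 - 2)^2 + (0 - 2)^2 + (0 - 2)^2} = \sqrt{1 + 4 + 4} = 3$$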
Listing 3: Unit test of the distance calculation function
# Test your algorithm with prepared array and unknown point
test_arr = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
])
test_unknown_point = [2, 2, 2]
output_distances = np.array([3, 3, 3])

distances = calculate_distances(coordinates=test_arr, unknown_point=test_unknown_point)
assert (distances == output_distances).all()  # Here is a test
Hopefully our function has passed the test and we are able to move to the next step of the algorithm development.
Step 5: Rewrite IDW function from previous tutorial
In the previous lesson our IDW function was designed only for two-dimensional data. Let's recall it (Listing 4).
Listing 4: The two-dimensional IDW function
def inverse_distance_weighting(unknown_point, points, power, ndist=10):
    distances = calculate_distances(points, unknown_point)
    points_and_dists = np.c_[points, distances]

    # Sort and get only 10 values
    points_and_dists = points_and_dists[points_and_dists[:, -1].argsort()]
    vals_for_idw = points_and_dists[:ndist, :]

    # Check if first distance is 0
    if vals_for_idw[0, -1] == 0:
        return vals_for_idw[0, 2]
    else:
        # If it's not, perform calculations
        weights = 1 / (vals_for_idw[:, -1]**power)
        numerator = weights * vals_for_idw[:, 2]
        interpolated_value = np.sum(numerator) / np.sum(weights)
        return interpolated_value
What must be changed to use this function for any number of dimensions?
We assume that the function argument points is an array of coordinates and their respective measurements, for example: [LAT, LON, DEPTH, MERCURY CONCENTRATION]. We cannot pass all of these values to the distance calculation, so we use numpy indexing to pass only the geographical coordinates and the depth. Instead of:
distances = calculate_distances(points, unknown_point)
We are going to write:
distances = calculate_distances(points[:, :-1], unknown_point)
And instead of:
points_and_dists = np.c_[points, distances]
We change it to:
vals_and_dists = np.c_[points[:, -1], distances]
In the first change we pass only the coordinates to the calculate_distances() function. The second change is analogous to the first one, but this time we pass all rows and only the last column into a new array of the observed values and the distances between them and the point of unknown value.
Finally we get an array with two columns: [known point value, distance between known point and unknown point]. As you may have noticed, we've changed the variable name from points_and_dists to vals_and_dists. It is a naming convention derived from clean code: a variable name should be descriptive and it definitely shouldn't be misleading.
The sorting part doesn't change, and we must still check whether the first distance in the sorted list is equal to 0. From this point the main difference is related to the array indexing, which you may see in Listing 5 with the full implementation of this function. The main problem is the distance calculation function: if we build it to work with n-dimensional data, then our IDW function will work without any problems.
Listing 5: Multidimensional IDW function
def inverse_distance_weighting(unknown_point, points, power, ndist=10):
    """
    Function estimates values in unknown points with the inverse
    distance weighted interpolation technique.

    INPUT:
    :param unknown_point: coordinates of the unknown point,
    :param points: (array) list of points and their values [dim 1, ..., dim n, val],
    :param power: (float) constant used to calculate IDW weight -> weight = 1/(distance**power),
    :param ndist: (int) how many closest distances are included in weighting,

    OUTPUT:
    :return interpolated_value: (float) value interpolated by the IDW method.

    Inverse distance weighted interpolation is:
    est = SUM(WEIGHTS * KNOWN VALS) / SUM(WEIGHTS)
    and WEIGHTS = 1 / (DISTANCE TO UNKNOWN**power)
    where power is a constant hyperparameter which tells how much a point
    is influenced by other points.
    """
    distances = calculate_distances(points[:, :-1], unknown_point)
    vals_and_dists = np.c_[points[:, -1], distances]

    # Sort by distance and get only the ndist closest values
    vals_and_dists = vals_and_dists[vals_and_dists[:, 1].argsort()]
    vals_for_idw = vals_and_dists[:ndist, :]

    # Check if first distance is 0
    if vals_for_idw[0, 1] == 0:
        # If first distance is 0 then return first value
        return vals_for_idw[0, 0]
    else:
        # If it's not, perform calculations
        weights = 1 / (vals_for_idw[:, 1]**power)
        numerator = weights * vals_for_idw[:, 0]
        interpolated_value = np.sum(numerator) / np.sum(weights)
        return interpolated_value
Step 6: Test IDW function
As we've tested the distance calculation, we are going to test the IDW algorithm too. The idea is the same: we create hard-coded arrays and check if the output matches the manual calculations. In this case every known point carries the value 1, so any weighted average of them, and therefore the interpolated value, must also be 1. Listing 6 is the test for it.
Listing 6: Unit test of Inverse Distance Weighting function
test_arr = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1]
])
test_unknown_point = [2, 2, 2]
output_est = 1

estimated_value = inverse_distance_weighting(test_unknown_point, test_arr, 2)
assert output_est == estimated_value
Step 7: Test IDW with artificial data
To close the function development part, we will test it with randomly generated points. We should create an evenly spaced set of points in 3-dimensional space (Figure 3). We can achieve this with numpy in two operations: coordinate generation and positioning on a grid (Listing 7).
Listing 7: Function for creation of an evenly spaced point cloud
def generate_coordinates(lower, upper, step_size, dimension):
    coordinates = [np.arange(lower, upper, step_size) for i in range(0, dimension)]
    mesh = np.hstack(np.meshgrid(*coordinates)).swapaxes(0, 1).reshape(dimension, -1).T
    return mesh

coords = generate_coordinates(0, 101, 10, 3)
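As a quick sanity check of the canvas we have just generated (with the call above there are 11 values per axis, so 11**3 grid nodes):

# Inspect the generated canvas of coordinates
print(coords.shape)  # (1331, 3) -> 1331 grid nodes in 3 dimensions
print(coords[:3])    # a few of the generated [x, y, z] coordinates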
The method from Listing 7 uses the numpy.arange() function to generate an array with coordinates from the lower to the upper limit, and then it uses numpy.meshgrid() to create all possible combinations of the coordinate values in a space whose dimensionality is controlled by the dimension argument (Figure 3). Unfortunately for us, all of those points are only coordinates without any value. We must generate a sample of random points in the same space, but this time with assigned values. This can be done with the function from Listing 1; the only difference is the number of values in a row: previously it was three (x, y, z) and now we need four (x, y, z, value). One realization of this function is presented in Listing 8 and Figure 4 shows the generated set of points.
Listing 8: Random sample of values in 3D space.
# Generate points with values
known_points = np.random.randint(0, 101, size=(20, 4))

# Show known points
figure = plt.figure(figsize=(8, 8), facecolor='w')
ax = figure.add_subplot(111, projection='3d')
p = ax.scatter3D(known_points[:, 0], known_points[:, 1], known_points[:, 2],
                 c=known_points[:, -1], cmap='viridis')
figure.colorbar(p)
plt.show()
The result of our work up to this moment is a 3D canvas of evenly spaced points and a few samples for interpolation. Now we are able to perform the interpolation and fill the canvas with values. We will store the output as a new numpy array of rows in the form [x, y, z, interpolated value], and to do so we iterate through each record of the empty canvas. Listing 9 shows the function which performs this process and Figure 5 presents the resulting grid.
Listing 9: Interpolate values from canvas.
def interpolate_values(array_of_coordinates, array_of_known_points):
    output_arr = []
    for row in array_of_coordinates:
        interpolated = inverse_distance_weighting(row, array_of_known_points,
                                                  power=3, ndist=5)
        interpol_arr = np.zeros(shape=(1, len(row) + 1))
        interpol_arr[:, :-1] = row
        interpol_arr[:, -1] = interpolated
        output_arr.append(interpol_arr[0])
    return np.array(output_arr)

interp = interpolate_values(coords, known_points)

# Show interpolation results
figure = plt.figure(figsize=(12, 12), facecolor='w')
ax = figure.add_subplot(111, projection='3d')
p = ax.scatter3D(interp[:, 0], interp[:, 1], interp[:, 2], c=interp[:, -1], cmap='viridis')
figure.colorbar(p)
plt.show()
Congratulations! We've created an IDW algorithm for interpolation of point values in 3D space! Finally, we are going to work with real-world data!
Data Preparation
! Code listings in this part come from the idw-3d-and-mercury-concentrations-data-prep.ipynb notebook.
We use a similar data preparation procedure as in the previous tutorial. The significant difference is that we are going to use all dates of measurement and we limit the extent to the coastline of Italy. Download the data from here and open a new Jupyter Notebook. Import pandas and read the csv file with observations using the pandas.read_csv() method. Create a copy of the DataFrame with the columns ['lon', 'lat', 'depth_m', 'mea_ug_kg_orig'], which are: longitude, latitude, depth, and mean concentration of Hg in the sampled organisms (Listing 10).
Listing 10: Read and prepare DataFrame with observations
import pandas as pd

# Read dataframe
df = pd.read_csv('input/190516_m2b_marine_biota.csv', sep=';',
                 encoding="ISO-8859-1", index_col='id')

# Get only columns: ['lon', 'lat', 'depth_m', 'mea_ug_kg_orig']
cols = ['lon', 'lat', 'depth_m', 'mea_ug_kg_orig']
ndf = df[cols]
As we decided earlier, we limit the study extent to the Italian coast and to a maximum sampling depth of 20 meters. The study extent is defined by the bounding coordinates 35.3N – 47.05N and 6.37E – 18.31E. With pandas boolean indexing we are able to limit the study to the area within the given coordinates and depth range (Listing 11).
Listing 11: Defining study extent with pandas DataFrame indexing
# Limit data by latitude and longitude and depth
# Latitude limits: 35.3N - 47.05N
# Longitude limits: 6.37E - 18.31E
# Depth limit: 0m - 20m

lat_limit_bottom = ndf['lat'] > 35.3
lat_limit_top = ndf['lat'] < 47.05
lon_limit_bottom = ndf['lon'] > 6.37
lon_limit_top = ndf['lon'] < 18.31
depth_limit_top = ndf['depth_m'] >= 0  # inverted!
depth_limit_bottom = ndf['depth_m'] < 20

ndf = ndf[lat_limit_bottom & lat_limit_top &
          lon_limit_bottom & lon_limit_top &
          depth_limit_bottom & depth_limit_top]
The created DataFrame has 2387 records and we assume that this is enough for the analysis. We should store the transformed data, which can be done with the DataFrame.to_csv() method. In this tutorial we'll name the output file prepared_data_mercury_concentrations_volumetric.csv. You can download this file from the tutorial repository if you've skipped the data preparation step.
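A minimal sketch of that step, assuming the filtered frame is still called ndf as in Listing 11 (the index is written by default, so the id column survives the round trip):

# Store the prepared subset for the analysis part
ndf.to_csv('prepared_data_mercury_concentrations_volumetric.csv')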
Analysis
! Code listings in this part come from the idw-3d-analysis.ipynb notebook.
At the beginning of this lesson we developed tools for data analysis. Then we prepared the dataset. Now it's time for something far more interesting: we finally join our materials (data) and tools (algorithms) to perform spatial analysis. Our work here is straightforward, with one exception at the end of the project. This part of the project requires importing the standard Spatial Data Science libraries (pandas, geopandas, numpy) and matplotlib.pyplot for the visualization. We'll also import the ScalarMappable class from matplotlib.cm and use it in the last part of the exercise for colorbar plotting.
Start with loading:
- the prepared_data_mercury_concentrations_volumetric.csv file with pandas,
- italy_coastline.shp with geopandas (Listing 12).
Listing 12: Import packages and load data for analysis
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable

# 1. Read data.
df = pd.read_csv('prepared_data_mercury_concentrations_volumetric.csv',
                 sep=',', encoding='utf8', index_col='id')

gdf = gpd.read_file('input/italy_coastline.shp')
gdf.set_index('id', inplace=True)
A good idea is to check our DataFrame before we start the analysis. The mercury concentrations table should look like:

| id | lon | lat | depth_m | mea_ug_kg_orig |
| --- | --- | --- | --- | --- |
| 1169 | 12.5 | 44.9 | 5.0 | 114.0 |
| 9577 | 14.0 | 43.8 | 10.0 | 22.0 |
Table: DataFrame with mercury concentrations in Mediterranean Sea organisms.

With geopandas we are able to plot our GeoDataFrame and check visually if everything is OK with the geometry; use the gdf.plot() method for it. We may also inspect the geometry type: just print the geometry attribute of the GeoDataFrame with the gdf.geometry command. In our case it is the MultiPoint type.
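A minimal sketch of these checks (the markersize value is an arbitrary choice):

# Quick sanity checks before the analysis
print(df.head())     # first rows of the mercury concentrations table
print(gdf.geometry)  # geometry column of the coastline GeoDataFrame

gdf.plot(markersize=1)  # visual check of the coastline geometry
plt.show()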
Now the hardest part: build the volumetric profile of the coastline and perform inverse distance weighting. For the sake of simplicity we skip the real depths and build five depth levels for each point (0, 5, 10, 15, 20 meters). We could create a new DataFrame with columns for each level, but there is an option to create those profiles while the interpolation algorithm runs, and we will use this opportunity. Right now it may be hard to imagine how we will do it, but Figure 6 shows the algorithm to make it clearer. We have to write four functions to achieve the final result. Three of them already exist: calculate_distances() from Listing 2, inverse_distance_weighting() from Listing 5 and interpolate_values() from Listing 9, which will be slightly changed for the needs of our process.
The algorithm for 3D IDW is complicated, but the complexity comes from the depth profile building. Without this step, for example if we provide a pre-defined canvas, the process is straightforward. Look at the list below: the second item is the additional step of depth profile creation, and the remaining items are the algorithm without it:
- iterate through each point where you want to interpolate values (here it is the Geometry object from Figure 6),
- build depth profiles for the specific unknown point and return an array with that point repeated at the specific depths, for example: [[x, y, 0], [x, y, 5], [x, y, 10]] where we have three depth levels,
- perform inverse distance weighting on those points against the array containing points with known values (here it is the Points object from Figure 6),
- assign the interpolated value to the point and return it.
All values should be passed into the interpolate_values() function (Listing 13 is the changed function). Then, in the loop, each unknown point from the canvas is passed to the build_volumetric_profile_array() function (Listing 14). From there the array with depth profiles is returned back to interpolate_values(), and for each depth of this array inverse distance weighting is applied against the known points. The interpolated value is stored in the output array. When all points are interpolated, the output array is returned and we can visualize it.
Listing 13: Interpolate values, updated function
def interpolate_values(array_of_coordinates, array_of_known_points):
    output_arr = []
    for row in array_of_coordinates:
        pts = build_volumetric_profile_array(row[0])  # Here is an update
        for pt in pts:
            interpolated = inverse_distance_weighting(pt, array_of_known_points,
                                                      power=3, ndist=5)
            interpol_arr = np.zeros(shape=(1, len(pt) + 1))
            interpol_arr[:, :-1] = pt
            interpol_arr[:, -1] = interpolated
            output_arr.append(interpol_arr[0])
    return np.array(output_arr)
Listing 14: Build depth profile
def build_volumetric_profile_array(points):
    """Function adds depth levels to the coastline geometry"""
    ranges = np.arange(0, 21, 5)
    mpts = []
    for r in ranges:
        vals = [points[0].x, points[0].y, r]
        mpts.append(vals)
    return np.array(mpts)
Now we move to the very interesting part: data visualization in multiple dimensions. We have done it before (Listing 9) and we will do it again. Here's a bit of advice: our depth levels are positive integers, so the visualization will be inverted (the deepest level will be at the top). To change this behavior you can modify the function from Listing 14 to produce negative values, or you can invert the axis in matplotlib. To do the latter we use the ax.invert_zaxis() method (Figure 7).
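A minimal sketch of that plot, assuming the interpolation output is stored in the interpolated array used in Listing 15 (the figure size is an arbitrary choice):

# Show the interpolated depth profiles with the depth axis pointing downwards
figure = plt.figure(figsize=(10, 10), facecolor='w')
ax = figure.add_subplot(111, projection='3d')
p = ax.scatter3D(interpolated[:, 0], interpolated[:, 1], interpolated[:, 2],
                 c=interpolated[:, -1], cmap='viridis')
ax.invert_zaxis()  # deepest level ends up at the bottom of the plot
figure.colorbar(p)
plt.show()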
Ok... there are some patterns, but unfortunately the plot is not readable! It is hard to distinguish specific points and values!
Visualization of data in three or more dimensions is tricky. A good idea is to change the plot into a series of 2-dimensional scatter plots to show how the values change with depth. This is not such an easy task, and I was stuck on it when I wanted to plot the colorbar. Fortunately, there is a method to do it. We mentioned earlier that we must import the ScalarMappable class from matplotlib; here's the use of it. We are going to build a colorbar based on the assumption that the minimum concentration of MeHg is 0 and the maximum is 30,000. The whole process is presented in Listing 15.
Listing 15: Plot multiple depth profiles as a series of plots
# Get unique depths
unique_depths = np.unique(interpolated[:, 2])

# Prepare figures
scale = np.linspace(0, 30000, 300)
cmap = plt.get_cmap('viridis')
norm = plt.Normalize(scale.min(), scale.max())
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])

fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 35))

for idx, depth_level in enumerate(unique_depths):
    x = interpolated[:, 0][interpolated[:, 2] == depth_level]
    y = interpolated[:, 1][interpolated[:, 2] == depth_level]
    col = interpolated[:, -1][interpolated[:, 2] == depth_level]
    axes[idx].scatter(x, y, c=col, vmin=0, vmax=30000, cmap='viridis')
    axes[idx].set_title('Mercury concentrations at depth: {} meters'.format(depth_level))
    fig.colorbar(sm, ax=axes[idx])
Figure 8 is better for comparisons between depths. This teaches us that it is better to distribute data among multiple plots if it will be analyzed by a human expert. The same approach may be used for time-series presentation or other 3-dimensional datasets.
Ok, that's all. We have created the full algorithm for 3D IDW and now we are able to perform IDW in n-dimensional space. Additionally, we know how to plot 3D data in a way that preserves meaningful information and reduces the noise related to overlapping points. Listing 16 is the full code to perform this kind of analysis. Feel free to use it and share it!
Listing 16: Full code for 3D IDW interpolation
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable

# 1. Read data.
# 2. Check data.
# 3. Prepare model (geometry).
# 4. Interpolate missing values.
# 5. Visualize output.

# 1. Read data.
df = pd.read_csv('prepared_data_mercury_concentrations_volumetric.csv',
                 sep=',', encoding='utf8', index_col='id')

gdf = gpd.read_file('input/italy_coastline.shp')
gdf.set_index('id', inplace=True)

# 3. Prepare model (geometry).
# For each point add a depth profile - create a new array with those points
def build_volumetric_profile_array(points):
    """Function adds depth levels to the coastline geometry"""
    ranges = np.arange(0, 21, 5)
    mpts = []
    for r in ranges:
        vals = [points[0].x, points[0].y, r]
        mpts.append(vals)
    return np.array(mpts)

# 4. Interpolate missing values.
# Calculate distances between the 3D coordinates list and the unknown point.
# Use numpy array operations.
def calculate_distances(coordinates, unknown_point):
    # Get coordinates dimension:
    # Number of columns in numpy array / Number of dimensions
    coordinates_dim = coordinates.shape[1]

    distances_between_dims = []
    for i in range(coordinates_dim):
        distance = (coordinates[:, i] - unknown_point[i])**2
        distances_between_dims.append(distance)

    dists_array = np.sqrt(np.sum(distances_between_dims, axis=0))
    return dists_array

def inverse_distance_weighting(unknown_point, points, power, ndist=10):
    """
    Function estimates values in unknown points with the inverse
    distance weighted interpolation technique.

    INPUT:
    :param unknown_point: coordinates of the unknown point,
    :param points: (array) list of points and their values [dim 1, ..., dim n, val],
    :param power: (float) constant used to calculate IDW weight -> weight = 1/(distance**power),
    :param ndist: (int) how many closest distances are included in weighting,

    OUTPUT:
    :return interpolated_value: (float) value interpolated by the IDW method.

    Inverse distance weighted interpolation is:
    est = SUM(WEIGHTS * KNOWN VALS) / SUM(WEIGHTS)
    and WEIGHTS = 1 / (DISTANCE TO UNKNOWN**power)
    where power is a constant hyperparameter which tells how much a point
    is influenced by other points.
    """
    distances = calculate_distances(points[:, :-1], unknown_point)
    vals_and_dists = np.c_[points[:, -1], distances]

    # Sort by distance and get only the ndist closest values
    vals_and_dists = vals_and_dists[vals_and_dists[:, 1].argsort()]
    vals_for_idw = vals_and_dists[:ndist, :]

    # Check if first distance is 0
    if vals_for_idw[0, 1] == 0:
        # If first distance is 0 then return first value
        return vals_for_idw[0, 0]
    else:
        # If it's not, perform calculations
        weights = 1 / (vals_for_idw[:, 1]**power)
        numerator = weights * vals_for_idw[:, 0]
        interpolated_value = np.sum(numerator) / np.sum(weights)
        return interpolated_value

def interpolate_values(array_of_coordinates, array_of_known_points):
    output_arr = []
    for row in array_of_coordinates:
        pts = build_volumetric_profile_array(row[0])
        for pt in pts:
            interpolated = inverse_distance_weighting(pt, array_of_known_points,
                                                      power=3, ndist=5)
            interpol_arr = np.zeros(shape=(1, len(pt) + 1))
            interpol_arr[:, :-1] = pt
            interpol_arr[:, -1] = interpolated
            output_arr.append(interpol_arr[0])
    return np.array(output_arr)

geometry_array = gdf.to_numpy()
interpolated = interpolate_values(geometry_array, df.to_numpy())

# 5. Visualize output.
# Get unique depths
unique_depths = np.unique(interpolated[:, 2])

# Prepare figures
scale = np.linspace(0, 30000, 300)
cmap = plt.get_cmap('viridis')
norm = plt.Normalize(scale.min(), scale.max())
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])

fig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 35))

for idx, depth_level in enumerate(unique_depths):
    x = interpolated[:, 0][interpolated[:, 2] == depth_level]
    y = interpolated[:, 1][interpolated[:, 2] == depth_level]
    col = interpolated[:, -1][interpolated[:, 2] == depth_level]
    axes[idx].scatter(x, y, c=col, vmin=0, vmax=30000, cmap='viridis')
    axes[idx].set_title('Mercury concentrations at depth: {} meters'.format(depth_level))
    fig.colorbar(sm, ax=axes[idx])
Exercises:
- Refactor the code and create a class for the IDW calculation. This class should have all functions (methods) from the lesson. Think about the class arguments (parameters) which should be included during the object initialization.
- Think about how to visualize a 4-dimensional dataset, for example [lon, lat, depth, time]. Is there a good way to do it? What are the trade-offs of each way?
- There is one particular problem related to our analysis: we assume that the depth has the same spatial scale as the lon/lat coordinates, and this is not true. How could you enhance the algorithm to support different scales? (We will answer this question in the future, but feel free to experiment with this issue.)
Next Lesson
Bibliography
[1] Health Effects of Exposures to Mercury. United States Environmental Protection Agency. Link: https://www.epa.gov/mercury/health-effects-exposures-mercury
[2] EPA-FDA Fish Advice: Technical Information. United States Environmental Protection Agency. Link: https://www.epa.gov/fish-tech/epa-fda-fish-advice-technical-information
[3] Github with article’s notebooks and shapefiles
[4] Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; Živković, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. PANGAEA, https://doi.org/10.1594/PANGAEA.899723
Changelog
- 2021-07-18: The first release.